📊 Analysis of Historical Water Intake Across Economic Sectors for the North Kazakhstan WaterManagement Basins for 2004-2023¶
In [1]:
##Import reuired libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import openpyxl
In [2]:
# Read the data
df = pd.read_excel("wateruse_masterTable_modified.xlsx", engine='openpyxl')
# Preview the data
df.head()
Out[2]:
| Water management basin | Types of water use | Year | Total water withdrawal | Water intake_Surface (including water received from another country)_Total | Water intake_Surface (including water received from other countries)_Of which sea water | Water intake_Underground and mine_Total | Water intake_Underground and mine_Of which mine-mine | Water intake_Collector and drainage | Water intake_Wastewater | ... | Used_surface_Total | Used_surface_Of which sea | Used_underground_Total | Used_underground_Of which mine-mine | Used_Collector-drainage | Used_Waste | Total water discharge | Water disposal_Incl. to water bodies | Non-returnable water consumption | Transport losses | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ertis Basin | Household and drinking | 2004-01-01 | 2.297324e+05 | 1.550267e+05 | 0.0 | 74705.726459 | 0.0 | 0.0 | 0.0 | ... | 125394.07 | 0.0 | 53642.72 | 0.0 | 0.0 | 0.0 | 43602.46 | 10856.00 | 2.188764e+05 | 50695.592432 |
| 1 | Ertis Basin | Production facilities | 2004-01-01 | 1.695084e+06 | 1.628900e+06 | 0.0 | 65990.309773 | 1799.4 | 193.0 | 0.0 | ... | 1607784.06 | 0.0 | 61671.49 | 1799.4 | 193.0 | 0.0 | 1525102.94 | 1495091.64 | 1.999919e+05 | 25434.973695 |
| 2 | Ertis Basin | Agriculture, total | 2004-01-01 | 1.037166e+06 | 9.807769e+05 | 0.0 | 56389.100000 | 0.0 | 0.0 | 0.0 | ... | 945741.20 | 0.0 | 48026.10 | 0.0 | 0.0 | 0.0 | 109.00 | 0.00 | 1.037166e+06 | 43398.733873 |
| 3 | Ertis Basin | Agriculture(agricultural water supply) | 2004-01-01 | 2.913300e+04 | 6.797500e+03 | 0.0 | 22335.500000 | 0.0 | 0.0 | 0.0 | ... | 6797.50 | 0.0 | 22331.50 | 0.0 | 0.0 | 0.0 | 109.00 | 0.00 | 2.913300e+04 | 4.000000 |
| 4 | Ertis Basin | Agriculture (regular irrigation) | 2004-01-01 | 1.179127e+05 | 1.141949e+05 | 0.0 | 3717.800000 | 0.0 | 0.0 | 0.0 | ... | 85907.40 | 0.0 | 2591.80 | 0.0 | 0.0 | 0.0 | 0.00 | 0.00 | 1.179127e+05 | 29413.522951 |
5 rows × 21 columns
In [3]:
# Summary of columns and data types
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2300 entries, 0 to 2299 Data columns (total 21 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Water management basin 2300 non-null object 1 Types of water use 2300 non-null object 2 Year 2300 non-null datetime64[ns] 3 Total water withdrawal 2300 non-null float64 4 Water intake_Surface (including water received from another country)_Total 2300 non-null float64 5 Water intake_Surface (including water received from other countries)_Of which sea water 2300 non-null float64 6 Water intake_Underground and mine_Total 2300 non-null float64 7 Water intake_Underground and mine_Of which mine-mine 2300 non-null float64 8 Water intake_Collector and drainage 2300 non-null float64 9 Water intake_Wastewater 2300 non-null float64 10 Total water used 2300 non-null float64 11 Used_surface_Total 2300 non-null float64 12 Used_surface_Of which sea 2300 non-null float64 13 Used_underground_Total 2300 non-null float64 14 Used_underground_Of which mine-mine 2300 non-null float64 15 Used_Collector-drainage 2300 non-null float64 16 Used_Waste 2300 non-null float64 17 Total water discharge 2300 non-null float64 18 Water disposal_Incl. to water bodies 2300 non-null float64 19 Non-returnable water consumption 2300 non-null float64 20 Transport losses 2300 non-null float64 dtypes: datetime64[ns](1), float64(18), object(2) memory usage: 377.5+ KB
In [4]:
# Summary Statistic for all columns numeric clo
df.describe()
Out[4]:
| Year | Total water withdrawal | Water intake_Surface (including water received from another country)_Total | Water intake_Surface (including water received from other countries)_Of which sea water | Water intake_Underground and mine_Total | Water intake_Underground and mine_Of which mine-mine | Water intake_Collector and drainage | Water intake_Wastewater | Total water used | Used_surface_Total | Used_surface_Of which sea | Used_underground_Total | Used_underground_Of which mine-mine | Used_Collector-drainage | Used_Waste | Total water discharge | Water disposal_Incl. to water bodies | Non-returnable water consumption | Transport losses | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2300 | 2.300000e+03 | 2.300000e+03 | 2.300000e+03 | 2300.000000 | 2300.000000 | 2300.000000 | 2300.000000 | 2.300000e+03 | 2.300000e+03 | 2.300000e+03 | 2300.000000 | 2300.000000 | 2300.000000 | 2300.000000 | 2.300000e+03 | 2.300000e+03 | 2.300000e+03 | 2300.000000 |
| mean | 2013-07-02 06:00:00 | 1.453287e+05 | 1.350105e+05 | 1.813367e+04 | 9602.003336 | 2781.935043 | 3.592957 | 712.599965 | 1.361611e+05 | 1.273044e+05 | 1.801260e+04 | 8164.489539 | 2060.567991 | 3.592957 | 688.622574 | 8.105590e+04 | 7.438588e+04 | 7.205035e+04 | 6419.771255 |
| min | 2004-01-01 00:00:00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 |
| 25% | 2008-10-01 12:00:00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 |
| 50% | 2013-07-02 12:00:00 | 8.971000e+01 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 |
| 75% | 2018-04-02 06:00:00 | 3.676415e+04 | 2.082212e+04 | 0.000000e+00 | 1455.860000 | 0.000000 | 0.000000 | 0.000000 | 3.189211e+04 | 1.670607e+04 | 0.000000e+00 | 1072.315000 | 0.000000 | 0.000000 | 0.000000 | 5.534900e+02 | 0.000000e+00 | 3.162824e+04 | 287.417754 |
| max | 2023-01-01 00:00:00 | 4.064752e+06 | 3.889953e+06 | 1.443854e+06 | 227028.106230 | 94119.700000 | 745.000000 | 106127.500000 | 3.813772e+06 | 3.645357e+06 | 1.443854e+06 | 193283.280000 | 91455.600000 | 745.000000 | 106127.500000 | 2.457958e+06 | 2.357494e+06 | 1.837918e+06 | 267926.650000 |
| std | NaN | 4.946858e+05 | 4.719914e+05 | 1.420952e+05 | 27145.307323 | 10411.970902 | 37.646091 | 5490.519830 | 4.721148e+05 | 4.526619e+05 | 1.413577e+05 | 23235.252312 | 8811.952714 | 37.646091 | 5378.827614 | 3.386502e+05 | 3.252408e+05 | 2.264906e+05 | 21794.368717 |
In [5]:
# Check for missing values
df.isnull().sum()
Out[5]:
Water management basin 0 Types of water use 0 Year 0 Total water withdrawal 0 Water intake_Surface (including water received from another country)_Total 0 Water intake_Surface (including water received from other countries)_Of which sea water 0 Water intake_Underground and mine_Total 0 Water intake_Underground and mine_Of which mine-mine 0 Water intake_Collector and drainage 0 Water intake_Wastewater 0 Total water used 0 Used_surface_Total 0 Used_surface_Of which sea 0 Used_underground_Total 0 Used_underground_Of which mine-mine 0 Used_Collector-drainage 0 Used_Waste 0 Total water discharge 0 Water disposal_Incl. to water bodies 0 Non-returnable water consumption 0 Transport losses 0 dtype: int64
In [6]:
df['Types of water use'].unique()
Out[6]:
array(['Household and drinking', 'Production facilities',
'Agriculture, total', 'Agriculture(agricultural water supply)',
'Agriculture (regular irrigation)',
'Agriculture(estuary irrigation)',
'Agriculture (irrigation of pastures)',
'Agriculture(hayfield bay)', 'Pond and fishing industry',
'Watering of green spaces', 'Channel flushing',
'Maintaining set horizons', 'Filling of bulk reservoirs',
'Maintaining reservoir pressure', 'Other needs',
'Resetting the SDR without using', 'Transit water',
'Forced water intake into canals', 'Transferred to another (BWMA)',
'Transferred to another country',
'Discharge of sewage drainage waters', 'Sanitary permits', 'TOTAL'],
dtype=object)
In [11]:
# Remove rows where "Types of water use" is either 'TOTAL' or 'Agriculture, total' to avoid confusion
df = df[~df['Types of water use'].isin(['TOTAL', 'Agriculture, total'])]
print(df['Types of water use'].unique())
['Household and drinking' 'Production facilities' 'Agriculture(agricultural water supply)' 'Agriculture (regular irrigation)' 'Agriculture(estuary irrigation)' 'Agriculture (irrigation of pastures)' 'Agriculture(hayfield bay)' 'Pond and fishing industry' 'Watering of green spaces' 'Channel flushing' 'Maintaining set horizons' 'Filling of bulk reservoirs' 'Maintaining reservoir pressure' 'Other needs' 'Resetting the SDR without using' 'Transit water' 'Forced water intake into canals' 'Transferred to another (BWMA)' 'Transferred to another country' 'Discharge of sewage drainage waters' 'Sanitary permits']
In [12]:
import pandas as pd
# --- Convert to million m³ ---
df[['Total water withdrawal', 'Total water used', 'Total water discharge',
'Non-returnable water consumption', 'Transport losses']] /= 1_000_000
# # --- Display full numbers (no scientific notation) ---
# pd.set_option('display.float_format', '{:,.2f}'.format)
# --- Print selected columns for preview ---
print(df[['Water management basin',
'Total water withdrawal',
'Total water used',
'Total water discharge',
'Non-returnable water consumption',
'Transport losses']])
Water management basin Total water withdrawal Total water used \
0 Ertis Basin 0.229732 0.179037
1 Ertis Basin 1.695084 1.669649
3 Ertis Basin 0.029133 0.029129
4 Ertis Basin 0.117913 0.088499
5 Ertis Basin 0.176604 0.169855
... ... ... ...
2294 Tobol-Torgai 0.000000 0.000000
2295 Tobol-Torgai 0.000000 0.000000
2296 Tobol-Torgai 0.000000 0.000000
2297 Tobol-Torgai 0.000000 0.000000
2298 Tobol-Torgai 0.000000 0.000000
Total water discharge Non-returnable water consumption \
0 0.043602 0.218876
1 1.525103 0.199992
3 0.000109 0.029133
4 0.000000 0.117913
5 0.000000 0.176604
... ... ...
2294 0.000000 0.000000
2295 0.000000 0.000000
2296 0.000000 0.000000
2297 0.000000 0.000000
2298 0.000000 0.000000
Transport losses
0 0.050696
1 0.025435
3 0.000004
4 0.029414
5 0.006748
... ...
2294 0.000000
2295 0.000000
2296 0.000000
2297 0.000000
2298 0.000000
[2100 rows x 6 columns]
In order to see Trends over total water withdrawal, used, and discharge over time by basin¶
In [15]:
import textwrap
# --- Step 1: Group and convert to million m³ ---
grouped = df.groupby(['Year', 'Water management basin'])[
['Total water withdrawal', 'Total water used', 'Total water discharge']
].sum().reset_index()
# --- Step 2: Summary Table ---
pivoted = grouped.pivot(index='Year', columns='Water management basin')
pivoted.columns = [' '.join(col).strip() for col in pivoted.columns.values]
summary_all = pivoted.agg(['mean', 'min', 'max'])
summary_table = summary_all.T.reset_index()
summary_table.columns = ['Indicator', 'Average (mln m³)', 'Min (mln m³)', 'Max (mln m³)']
summary_table[['Variable', 'Water Management Basin']] = summary_table['Indicator'].str.extract(r'(.*) (.+)$')
summary_table = summary_table[['Variable', 'Water Management Basin', 'Average (mln m³)', 'Min (mln m³)', 'Max (mln m³)']]
summary_table[['Average (mln m³)', 'Min (mln m³)', 'Max (mln m³)']] = summary_table[
['Average (mln m³)', 'Min (mln m³)', 'Max (mln m³)']
].round(2)
print (summary_table)
# --- Step 3: Export Summary Table ---
summary_table.to_excel("summary_statistics.xlsx", index=False)
summary_table.to_csv("summary_statistics.csv", index=False)
# --- Step 4: Optional Boxplot Selection ---
plot_variable = 'Total water discharge' # Change to 'Total water used' or 'Total water discharge'
palette_map = {
'Total water withdrawal': 'Reds',
'Total water used': 'Greens',
'Total water discharge': 'Blues'
}
# Plot
plt.figure(figsize=(12, 6))
ax = sns.boxplot(
data=grouped,
x='Water management basin',
y=plot_variable,
palette=palette_map.get(plot_variable, 'Set2')
)
ax.set_title(f'Distribution of {plot_variable} by Basin (mln m³)')
ax.set_ylabel(f'{plot_variable} (mln m³)')
# Wrap and rotate labels
labels = [textwrap.fill(label.get_text(), 15) for label in ax.get_xticklabels()]
ax.set_xticklabels(labels, rotation=30, ha='right')
plt.tight_layout()
# --- Step 5: Export Plot ---
plot_filename = plot_variable.lower().replace(" ", "_") + "_boxplot.png"
plt.savefig(plot_filename, dpi=300)
plt.show()
Variable Water Management Basin Average (mln m³) \
0 Total water withdrawal Ertis Basin 3.60
1 Total water withdrawal Esil Basin 0.29
2 Total water withdrawal Nura-Sarysu 1.52
3 Total water withdrawal Tobol-Torgai 0.14
4 Total water withdrawal Zhayik-Caspian 2.14
5 Total water used Ertis Basin 3.38
6 Total water used Esil Basin 0.26
7 Total water used Nura-Sarysu 1.45
8 Total water used Tobol-Torgai 0.11
9 Total water used Zhayik-Caspian 1.99
10 Total water discharge Ertis Basin 2.13
11 Total water discharge Esil Basin 0.11
12 Total water discharge Nura-Sarysu 1.24
13 Total water discharge Tobol-Torgai 0.07
14 Total water discharge Zhayik-Caspian 1.10
Min (mln m³) Max (mln m³)
0 3.06 4.06
1 0.20 0.46
2 1.28 1.81
3 0.10 0.17
4 0.31 3.15
5 2.77 3.81
6 0.16 0.43
7 1.18 1.72
8 0.08 0.16
9 0.26 2.94
10 1.70 2.46
11 0.04 0.36
12 1.01 1.49
13 0.01 0.11
14 0.05 1.47
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\3533368880.py:39: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect. ax = sns.boxplot( C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\3533368880.py:50: UserWarning: set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator. ax.set_xticklabels(labels, rotation=30, ha='right')
In [16]:
pivoted = grouped.pivot(index='Year', columns='Water management basin')
# Set seaborn style directly
sns.set_style("whitegrid")
# Plot helper function
def plot_time_series(df, title, ylabel):
plt.figure(figsize=(12, 6))
sns.lineplot(data=df)
plt.title(title, fontsize=14)
plt.xlabel('Year')
plt.ylabel(ylabel)
plt.xticks(rotation=45)
plt.legend(title='Basin', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
# Now plot
plot_time_series(pivoted['Total water withdrawal'], 'Total Water Withdrawal by Basin Over Time', 'Withdrawal (mln m3)')
plot_time_series(pivoted['Total water used'], 'Total Water Used by Basin Over Time', 'Water Used (mln m3)')
plot_time_series(pivoted['Total water discharge'], 'Total Water Discharge by Basin Over Time', 'Discharge (mln m3)')
Calculating and plotting a 3-year rolling average of withdrawal, use, and discharge BY BASIN¶
In [17]:
# Calculate 3-year rolling averages
rolling_withdrawal = pivoted['Total water withdrawal'].rolling(window=3, min_periods=1).mean()
rolling_used = pivoted['Total water used'].rolling(window=3, min_periods=1).mean()
rolling_discharge = pivoted['Total water discharge'].rolling(window=3, min_periods=1).mean()
# Helper plot function
def plot_rolling(data, title, ylabel):
plt.figure(figsize=(12, 6))
sns.lineplot(data=data)
plt.title(title, fontsize=14)
plt.xlabel('Year')
plt.ylabel(ylabel)
plt.xticks(rotation=45)
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend(title='Basin', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
# Plot each one
plot_rolling(rolling_withdrawal, 'Rolling Average: Water Withdrawal by Basin', 'Withdrawal (mln m³)')
plot_rolling(rolling_used, 'Rolling Average: Water Used by Basin', 'Water Used (mln m³)')
plot_rolling(rolling_discharge, 'Rolling Average: Water Discharge by Basin', 'Discharge (mln m³)')
Analyze As well "Non-returnable water consumption" and "Transport losses" for all basins by years¶
In [23]:
# --- Step 2: Group data by Year and Basin ---
grouped_2 = df.groupby(['Year', 'Water management basin'])[
['Non-returnable water consumption', 'Transport losses']
].sum().reset_index()
# # --- Step 3: Print grouped table ---
# print("Grouped Data:")
# print(grouped_2)
# --- Step 4: Summary statistics per Basin ---
summary_stats = grouped_2.groupby('Water management basin')[
['Non-returnable water consumption', 'Transport losses']
].agg(['mean', 'min', 'max']).round(2)
# Flatten MultiIndex columns
summary_stats.columns = ['_'.join(col) for col in summary_stats.columns]
summary_stats = summary_stats.reset_index()
print("\nSummary Statistics per Basin:")
print(summary_stats)
# --- Step 5 (Optional): Export results ---
grouped_2.to_excel("grouped_non_returnable_transport.xlsx", index=False)
#summary_stats.to_excel("summary_stats_by_basin.xlsx", index=False)
#print("\nData exported to Excel files successfully.")
# Box plot for Non-returnable Water Consumption ---
plt.figure(figsize=(12, 6))
sns.boxplot(
data=grouped_2,
x='Water management basin',
y='Non-returnable water consumption',
palette='Blues'
)
plt.title('Box Plot: Non-returnable Water Consumption by Basin')
plt.xlabel('Water Management Basin')
plt.ylabel('Volume (mln m³)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.savefig("boxplot_non_returnable_water.png", dpi=300, bbox_inches='tight')
plt.show()
# Box plot for Transport Losses ---
plt.figure(figsize=(12, 6))
sns.boxplot(
data=grouped_2,
x='Water management basin',
y='Transport losses',
palette='Oranges'
)
plt.title('Box Plot: Transport Losses by Basin')
plt.xlabel('Water Management Basin')
plt.ylabel('Volume (mln m³)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.savefig("boxplot_transport_losses.png", dpi=300, bbox_inches='tight')
plt.show()
Summary Statistics per Basin: Water management basin Non-returnable water consumption_mean \ 0 Ertis Basin 1.55 1 Esil Basin 0.26 2 Nura-Sarysu 0.36 3 Tobol-Torgai 0.12 4 Zhayik-Caspian 1.18 Non-returnable water consumption_min Non-returnable water consumption_max \ 0 0.84 1.84 1 0.19 0.42 2 0.27 0.41 3 0.10 0.14 4 0.28 1.84 Transport losses_mean Transport losses_min Transport losses_max 0 0.09 0.05 0.18 1 0.03 0.02 0.05 2 0.05 0.02 0.11 3 0.01 0.01 0.02 4 0.15 0.05 0.27
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\1661907474.py:29: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect. sns.boxplot(
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\1661907474.py:46: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect. sns.boxplot(
In [24]:
#Show Time series stacked
# Ensure 'Year' is integer
grouped_2['Year'] = pd.to_datetime(grouped_2['Year'], errors='coerce').dt.year
# --- Pivot for stacked bar ---
pivot_nc = grouped_2.pivot(index='Year', columns='Water management basin', values='Non-returnable water consumption').fillna(0)
pivot_loss = grouped_2.pivot(index='Year', columns='Water management basin', values='Transport losses').fillna(0)
# --- Stacked Bar: Non-returnable Water Consumption ---
pivot_nc.plot(kind='bar', stacked=True, figsize=(14, 6), colormap='Set3')
plt.title('Non-returnable Water Consumption by Basin (Stacked)', fontsize=14)
plt.xlabel('Year')
plt.ylabel('Volume (mln m³)')
plt.xticks(rotation=45)
plt.legend(title='Basin', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.grid(True, axis='y', linestyle='--', alpha=0.5)
plt.show()
# --- Stacked Bar: Transport Losses ---
pivot_loss.plot(kind='bar', stacked=True, figsize=(14, 6), colormap='Paired')
plt.title('Transport Losses by Basin (Stacked)', fontsize=14)
plt.xlabel('Year')
plt.ylabel('Volume (mln m³)')
plt.xticks(rotation=45)
plt.legend(title='Basin', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.grid(True, axis='y', linestyle='--', alpha=0.5)
plt.show()
How efficiently each basin uses water (Used / Withdrawal) across years.¶
✅ Definitions Efficiency = Total Water Used / Total Water Withdrawal
Computed per year for each basin or use type
In [25]:
#avoid division by zero
EPSILON = 1e-6
In [26]:
# Group by year and basin
grouped_basin_year = df.groupby(['Year', 'Water management basin'])[
['Total water withdrawal', 'Total water used']
].sum().reset_index()
# Calculate efficiency
grouped_basin_year['Efficiency'] = (
grouped_basin_year['Total water used'] / (grouped_basin_year['Total water withdrawal'] + EPSILON)
)
# Pivot for plotting
pivot_basin_eff = grouped_basin_year.pivot(index='Year', columns='Water management basin', values='Efficiency')
# Plot
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(14, 8))
sns.lineplot(data=pivot_basin_eff)
plt.title('Water Use Efficiency Over Time by Basin')
plt.xlabel('Year')
plt.ylabel('Efficiency (Used / Withdrawal)')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
In [27]:
#To highlight long-term trends:
pivot_rolling = pivot_basin_eff.rolling(window=3, min_periods=1).mean()
plt.figure(figsize=(14, 8))
sns.lineplot(data=pivot_rolling)
plt.title('Smoothed Water Use Efficiency (3-Year Rolling Avg)')
plt.xlabel('Year')
plt.ylabel('Efficiency (Used / Withdrawal)')
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(title='Basin', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
In [28]:
import seaborn as sns
import matplotlib.pyplot as plt
# Pivot table: rows = Year, columns = Basin, values = Efficiency
pivot_eff_heatmap = grouped_basin_year.pivot(index='Year', columns='Water management basin', values='Efficiency')
# Plot heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(
pivot_eff_heatmap,
cmap='YlGnBu', # Color map: Yellow-Green-Blue
annot=True, # Show numbers inside cells
fmt=".2f", # Format efficiency to 2 decimal places
linewidths=0.5, # Thin lines between cells
cbar_kws={'label': 'Efficiency (Used / Withdrawal)'}
)
plt.title("Water Use Efficiency Heatmap by Basin and Year", fontsize=16)
plt.ylabel("Year")
plt.xlabel("Water Management Basin")
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
In [29]:
import pandas as pd
import numpy as np
# Safety constant to avoid division by zero
EPSILON = 1e-6
# Group by year and basin
grouped_basin_year = df.groupby(['Year', 'Water management basin'])[
['Total water withdrawal', 'Total water used']
].sum().reset_index()
# Calculate efficiency
grouped_basin_year['Efficiency'] = (
grouped_basin_year['Total water used'] / (grouped_basin_year['Total water withdrawal'] + EPSILON)
)
# Summary table: Avg, Min, Max
summary = (
grouped_basin_year
.groupby('Water management basin')['Efficiency']
.agg(['mean', 'min', 'max'])
.rename(columns={
'mean': 'Avg Efficiency',
'min': 'Min Efficiency',
'max': 'Max Efficiency'
})
)
# Calculate trend (last year - first year)
trend = grouped_basin_year.sort_values('Year').groupby('Water management basin').agg(
first_year=('Efficiency', lambda x: x.iloc[0]),
last_year=('Efficiency', lambda x: x.iloc[-1])
)
summary['Trend Δ'] = (trend['last_year'] - trend['first_year']).round(3)
summary = summary.round(3)
# Sort by Avg Efficiency descending
summary = summary.sort_values(by='Avg Efficiency', ascending=False)
# Display summary
print("📊 Water Use Efficiency Summary by Basin (Sorted by Avg Efficiency)")
display(summary)
📊 Water Use Efficiency Summary by Basin (Sorted by Avg Efficiency)
| Avg Efficiency | Min Efficiency | Max Efficiency | Trend Δ | |
|---|---|---|---|---|
| Water management basin | ||||
| Nura-Sarysu | 0.956 | 0.916 | 0.983 | 0.012 |
| Ertis Basin | 0.937 | 0.885 | 0.975 | -0.029 |
| Zhayik-Caspian | 0.921 | 0.841 | 0.957 | 0.052 |
| Esil Basin | 0.875 | 0.774 | 0.946 | 0.150 |
| Tobol-Torgai | 0.813 | 0.626 | 0.931 | -0.266 |
In [30]:
import matplotlib.pyplot as plt
import seaborn as sns
# Set plot style
sns.set(style='whitegrid')
# Bar plot: Trend Δ per basin
plt.figure(figsize=(8, 5))
sns.barplot(
data=summary.reset_index(),
x='Water management basin',
y='Trend Δ',
palette='coolwarm',
order=summary.index
)
plt.title('Trend in Water Use Efficiency (Last Year - First Year)')
plt.ylabel('Efficiency Trend Δ')
plt.xlabel('Water Management Basin')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\2359841970.py:9: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect. sns.barplot(
In [31]:
#Identify basins with positive vs negative trends
summary['Trend Category'] = np.where(summary['Trend Δ'] > 0, 'Improving', 'Declining')
print(summary[['Trend Δ', 'Trend Category']])
Trend Δ Trend Category Water management basin Nura-Sarysu 0.012 Improving Ertis Basin -0.029 Declining Zhayik-Caspian 0.052 Improving Esil Basin 0.150 Improving Tobol-Torgai -0.266 Declining
In [32]:
#Basins with highest average efficiency but declining trend?
high_avg_decline = summary[(summary['Avg Efficiency'] > summary['Avg Efficiency'].median()) & (summary['Trend Δ'] < 0)]
print("Basins with high average efficiency but declining trend:")
print(high_avg_decline)
Basins with high average efficiency but declining trend:
Avg Efficiency Min Efficiency Max Efficiency \
Water management basin
Ertis Basin 0.937 0.885 0.975
Trend Δ Trend Category
Water management basin
Ertis Basin -0.029 Declining
In [33]:
#Summary statistics of efficiency trends
print("Trend Δ descriptive statistics:")
print(summary['Trend Δ'].describe())
Trend Δ descriptive statistics: count 5.000000 mean -0.016200 std 0.154623 min -0.266000 25% -0.029000 50% 0.012000 75% 0.052000 max 0.150000 Name: Trend Δ, dtype: float64
Types of water use Analysis accross each Water management basins¶
Household and Drinking Water Use¶
In [35]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Filter for Household and Drinking
household_df = df[df['Types of water use'] == 'Household and drinking']
# Group by Year and Basin
grouped = household_df.groupby(['Year', 'Water management basin'])[
['Total water withdrawal', 'Total water used', 'Total water discharge']
].sum().reset_index()
# Create pivot tables for line plots
pivot_withdrawal = grouped.pivot(index='Year', columns='Water management basin', values='Total water withdrawal')
pivot_used = grouped.pivot(index='Year', columns='Water management basin', values='Total water used')
pivot_discharge = grouped.pivot(index='Year', columns='Water management basin', values='Total water discharge')
# Prepare summary data for bar chart (total sum per basin)
total_per_basin = grouped.groupby('Water management basin')[
['Total water withdrawal', 'Total water used', 'Total water discharge']
].sum().sort_values(by='Total water withdrawal', ascending=False)
# Set plot style
sns.set(style='whitegrid')
plt.figure(figsize=(20, 18))
# Plot 1: Line - Water Withdrawal
plt.subplot(4, 1, 1)
sns.lineplot(data=pivot_withdrawal)
plt.title('Household and Drinking – Total Water Withdrawal by Basin')
plt.xlabel('Year')
plt.ylabel('Volume (mln m³)')
plt.legend(title='Basin', bbox_to_anchor=(1.02, 1))
plt.tight_layout()
# Plot 2: Line - Water Used
plt.subplot(4, 1, 2)
sns.lineplot(data=pivot_used)
plt.title('Household and Drinking – Total Water Used by Basin')
plt.xlabel('Year')
plt.ylabel('Volume (mln m³)')
plt.legend(title='Basin', bbox_to_anchor=(1.02, 1))
plt.tight_layout()
# Plot 3: Line - Water Discharge
plt.subplot(4, 1, 3)
sns.lineplot(data=pivot_discharge)
plt.title('Household and Drinking – Total Water Discharge by Basin')
plt.xlabel('Year')
plt.ylabel('Volume (mln m³)')
plt.legend(title='Basin', bbox_to_anchor=(1.02, 1))
plt.tight_layout()
# Plot 4: Bar Chart - Total by Basin
plt.subplot(4, 1, 4)
total_per_basin.plot(kind='bar', stacked=True, ax=plt.gca(), colormap='viridis')
plt.title('Cumulative Household and Drinking Water Use by Basin')
plt.ylabel('Total Volume (mln m³)')
plt.xlabel('Water Management Basin')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
In [36]:
# Calculate efficiency (water used / water withdrawal) safely
grouped['Efficiency'] = grouped['Total water used'] / (grouped['Total water withdrawal'] + 1e-6)
In [37]:
plt.figure(figsize=(12, 6))
sns.lineplot(
data=grouped,
x='Year',
y='Efficiency',
hue='Water management basin',
marker='o'
)
plt.title('Water Use Efficiency Over Time (Household and Drinking)')
plt.ylabel('Efficiency (Used / Withdrawal)')
plt.legend(title='Basin', bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
plt.show()
In [38]:
#Calculate Summary Stats per Basin
efficiency_summary = grouped.groupby('Water management basin')['Efficiency'].agg(['mean', 'min', 'max']).round(3)
efficiency_summary = efficiency_summary.rename(columns={
'mean': 'Avg Efficiency',
'min': 'Min Efficiency',
'max': 'Max Efficiency'
})
print("Water Use Efficiency (Household and Drinking:")
print(efficiency_summary)
Water Use Efficiency (Household and Drinking:
Avg Efficiency Min Efficiency Max Efficiency
Water management basin
Ertis Basin 0.776 0.674 0.907
Esil Basin 0.756 0.577 0.862
Nura-Sarysu 0.801 0.659 0.921
Tobol-Torgai 0.813 0.749 0.863
Zhayik-Caspian 0.895 0.836 0.974
In [39]:
#Calculate Efficiency Trend (Last year - First year) per Basin
trend = grouped.sort_values('Year').groupby('Water management basin').agg(
first_year_efficiency=('Efficiency', lambda x: x.iloc[0]),
last_year_efficiency=('Efficiency', lambda x: x.iloc[-1])
)
trend['Efficiency Trend Δ'] = (trend['last_year_efficiency'] - trend['first_year_efficiency']).round(3)
print("\nWater Use Efficiency (Household and Drinking:")
print(trend[['Efficiency Trend Δ']])
Water Use Efficiency (Household and Drinking:
Efficiency Trend Δ
Water management basin
Ertis Basin 0.095
Esil Basin 0.209
Nura-Sarysu 0.069
Tobol-Torgai 0.026
Zhayik-Caspian -0.104
In [40]:
#Correlation Analysis between volumes and efficiency
# Calculate correlations within the grouped data (per year and basin)
corr_df = grouped[['Total water withdrawal', 'Total water used', 'Total water discharge', 'Efficiency']]
corr_matrix = corr_df.corr().round(3)
print("\nCorrelation matrix:")
print(corr_matrix)
#Visualize Correlation Matrix with a heatmap
plt.figure(figsize=(6, 5))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix of Water Volumes(Household and Drinking) and Efficiency')
plt.tight_layout()
plt.show()
Correlation matrix:
Total water withdrawal Total water used \
Total water withdrawal 1.000 0.995
Total water used 0.995 1.000
Total water discharge 0.801 0.806
Efficiency 0.305 0.381
Total water discharge Efficiency
Total water withdrawal 0.801 0.305
Total water used 0.806 0.381
Total water discharge 1.000 0.303
Efficiency 0.303 1.000
"Production facilities" Water Use Analysis¶
In [41]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Filter for Production Facilities
prod_df = df[df['Types of water use'] == 'Production facilities']
# Group by Year and Basin
grouped_prod = prod_df.groupby(['Year', 'Water management basin'])[
['Total water withdrawal', 'Total water used', 'Total water discharge']
].sum().reset_index()
# Pivot for line plots
pivot_withdrawal = grouped_prod.pivot(index='Year', columns='Water management basin', values='Total water withdrawal')
pivot_used = grouped_prod.pivot(index='Year', columns='Water management basin', values='Total water used')
pivot_discharge = grouped_prod.pivot(index='Year', columns='Water management basin', values='Total water discharge')
# Bar chart summary data
total_per_basin = grouped_prod.groupby('Water management basin')[
['Total water withdrawal', 'Total water used', 'Total water discharge']
].sum().sort_values(by='Total water withdrawal', ascending=False)
# Set style and figure
sns.set(style='whitegrid')
plt.figure(figsize=(20, 18))
# Plot 1: Line - Water Withdrawal
plt.subplot(4, 1, 1)
sns.lineplot(data=pivot_withdrawal)
plt.title('Production Facilities – Total Water Withdrawal by Basin')
plt.xlabel('Year')
plt.ylabel('Volume (mln m³)')
plt.legend(title='Basin', bbox_to_anchor=(1.02, 1))
plt.tight_layout()
# Plot 2: Line - Water Used
plt.subplot(4, 1, 2)
sns.lineplot(data=pivot_used)
plt.title('Production Facilities – Total Water Used by Basin')
plt.xlabel('Year')
plt.ylabel('Volume (mln m³)')
plt.legend(title='Basin', bbox_to_anchor=(1.02, 1))
plt.tight_layout()
# Plot 3: Line - Water Discharge
plt.subplot(4, 1, 3)
sns.lineplot(data=pivot_discharge)
plt.title('Production Facilities – Total Water Discharge by Basin')
plt.xlabel('Year')
plt.ylabel('Volume (mln m³)')
plt.legend(title='Basin', bbox_to_anchor=(1.02, 1))
plt.tight_layout()
# Plot 4: Bar Chart - Cumulative Totals
plt.subplot(4, 1, 4)
total_per_basin.plot(kind='bar', stacked=True, ax=plt.gca(), colormap='cividis')
plt.title('Cumulative Production Facility Water Use by Basin')
plt.ylabel('Total Volume (m³)')
plt.xlabel('Water Management Basin')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
In [42]:
# Add efficiency column (used / withdrawal) safely
grouped_prod['Efficiency'] = grouped_prod['Total water used'] / (grouped_prod['Total water withdrawal'] + 1e-6)
# Plot Efficiency Trend Over Years by Basin
plt.figure(figsize=(12, 6))
sns.lineplot(
data=grouped_prod,
x='Year',
y='Efficiency',
hue='Water management basin',
marker='o'
)
plt.title('Water Use Efficiency Over Time (Production Facilities)')
plt.ylabel('Efficiency (Used / Withdrawal)')
plt.xlabel('Year')
plt.legend(title='Basin', bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
plt.show()
In [43]:
#Efficiency of Production Facilities Summary by Basin
efficiency_summary_prod = grouped_prod.groupby('Water management basin')['Efficiency'].agg(['mean', 'min', 'max']).round(3)
efficiency_summary_prod = efficiency_summary_prod.rename(columns={
'mean': 'Avg Efficiency',
'min': 'Min Efficiency',
'max': 'Max Efficiency'
})
print("Efficiency Summary by Basin (Production Facilities):")
print(efficiency_summary_prod)
Efficiency Summary by Basin (Production Facilities):
Avg Efficiency Min Efficiency Max Efficiency
Water management basin
Ertis Basin 0.990 0.952 0.995
Esil Basin 0.949 0.790 0.998
Nura-Sarysu 0.985 0.963 0.996
Tobol-Torgai 0.940 0.864 0.977
Zhayik-Caspian 0.957 0.717 0.996
In [44]:
# Efficiency Trend (Last Year - First Year) by Basin
efficiency_trend_prod = grouped_prod.sort_values('Year').groupby('Water management basin').agg(
first_year_efficiency=('Efficiency', lambda x: x.iloc[0]),
last_year_efficiency=('Efficiency', lambda x: x.iloc[-1])
)
efficiency_trend_prod['Efficiency Trend Δ'] = (efficiency_trend_prod['last_year_efficiency'] - efficiency_trend_prod['first_year_efficiency']).round(3)
print("\nEfficiency Trends by Basin (Production Facilities):")
print(efficiency_trend_prod[['Efficiency Trend Δ']])
Efficiency Trends by Basin (Production Facilities):
Efficiency Trend Δ
Water management basin
Ertis Basin 0.007
Esil Basin 0.129
Nura-Sarysu 0.015
Tobol-Torgai 0.038
Zhayik-Caspian 0.213
In [45]:
## Correlation Matrix & Heatmap of Production Facilities vs Efficiency
corr_df_prod = grouped_prod[['Total water withdrawal', 'Total water used', 'Total water discharge', 'Efficiency']]
corr_matrix_prod = corr_df_prod.corr().round(3)
print("\nCorrelation matrix (Production Facilities):")
print(corr_matrix_prod)
plt.figure(figsize=(6, 5))
sns.heatmap(corr_matrix_prod, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix of Water Volumes and Efficiency (Production Facilities)')
plt.tight_layout()
plt.show()
Correlation matrix (Production Facilities):
Total water withdrawal Total water used \
Total water withdrawal 1.000 1.000
Total water used 1.000 1.000
Total water discharge 0.995 0.997
Efficiency 0.495 0.505
Total water discharge Efficiency
Total water withdrawal 0.995 0.495
Total water used 0.997 0.505
Total water discharge 1.000 0.507
Efficiency 0.507 1.000
Combined Analysis for Agricultural Water Use Types (5 METRICS)¶
In [46]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# List of agricultural water use types
agri_types = [
'Agriculture(agricultural water supply)',
'Agriculture (regular irrigation)',
'Agriculture(estuary irrigation)',
'Agriculture (irrigation of pastures)',
'Agriculture(hayfield bay)'
]
# Store total usage for ranking
summary_data = []
# Loop through each agricultural water use type
for use_type in agri_types:
df_type = df[df['Types of water use'] == use_type]
# Group by year and basin
grouped = df_type.groupby(['Year', 'Water management basin'])[
['Total water withdrawal', 'Total water used', 'Total water discharge']
].sum().reset_index()
if grouped.empty:
print(f"Skipping {use_type} – No data found.")
continue
# Calculate total values for ranking
totals = grouped[['Total water withdrawal', 'Total water used', 'Total water discharge']].sum()
total_all = totals.sum()
summary_data.append({
'Use Type': use_type,
'Total Withdrawal': totals['Total water withdrawal'],
'Total Used': totals['Total water used'],
'Total Discharge': totals['Total water discharge'],
'Total Combined': total_all
})
# Prepare pivoted data for plotting (no division)
pivot_withdrawal = grouped.pivot(index='Year', columns='Water management basin', values='Total water withdrawal')
pivot_used = grouped.pivot(index='Year', columns='Water management basin', values='Total water used')
pivot_discharge = grouped.pivot(index='Year', columns='Water management basin', values='Total water discharge')
# Plot
sns.set(style='whitegrid')
plt.figure(figsize=(18, 12))
plt.subplot(3, 1, 1)
sns.lineplot(data=pivot_withdrawal)
plt.title(f'{use_type} – Water Withdrawal by Basin')
plt.xlabel('Year')
plt.ylabel('Volume (mln m³)') # Label changed only
plt.legend(title='Basin', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.subplot(3, 1, 2)
sns.lineplot(data=pivot_used)
plt.title(f'{use_type} – Water Used by Basin')
plt.xlabel('Year')
plt.ylabel('Volume (mln m³)') # Label changed only
plt.legend(title='Basin', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.subplot(3, 1, 3)
sns.lineplot(data=pivot_discharge)
plt.title(f'{use_type} – Water Discharge by Basin')
plt.xlabel('Year')
plt.ylabel('Volume (mln m³)') # Label changed only
plt.legend(title='Basin', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
In [47]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
# List of agricultural water use types
agri_types = [
'Agriculture(agricultural water supply)',
'Agriculture (regular irrigation)',
'Agriculture(estuary irrigation)',
'Agriculture (irrigation of pastures)',
'Agriculture(hayfield bay)'
]
# Store total usage for ranking
summary_data = []
# Loop through each agricultural water use type
for use_type in agri_types:
df_type = df[df['Types of water use'] == use_type]
# Group by year and basin
grouped = df_type.groupby(['Year', 'Water management basin'])[
['Total water withdrawal', 'Total water used', 'Total water discharge']
].sum().reset_index()
if grouped.empty:
continue
# Total for ranking
totals = grouped[['Total water withdrawal', 'Total water used', 'Total water discharge']].sum()
total_all = totals.sum()
summary_data.append({
'Use Type': use_type,
'Total Withdrawal': totals['Total water withdrawal'],
'Total Used': totals['Total water used'],
'Total Discharge': totals['Total water discharge'],
'Total Combined': total_all
})
# Plotting
pivot_withdrawal = grouped.pivot(index='Year', columns='Water management basin', values='Total water withdrawal')
pivot_used = grouped.pivot(index='Year', columns='Water management basin', values='Total water used')
pivot_discharge = grouped.pivot(index='Year', columns='Water management basin', values='Total water discharge')
sns.set(style='whitegrid')
# Create and sort summary DataFrame
summary_df = pd.DataFrame(summary_data).sort_values(by='Total Combined', ascending=False)
summary_df['% Share'] = (summary_df['Total Combined'] / summary_df['Total Combined'].sum() * 100).round(2)
# Display styled table
print("🔝 Summary Ranking by Total Combined Volume (Agricultural Types):")
display(summary_df.style.format({
'Total Withdrawal': '{:,.0f}',
'Total Used': '{:,.0f}',
'Total Discharge': '{:,.0f}',
'Total Combined': '{:,.0f}',
'% Share': '{:.2f}%'
}).background_gradient(subset=['Total Combined'], cmap='YlGnBu'))
# Plot combined bar chart
sns.set(style="whitegrid")
plt.figure(figsize=(10, 6))
barplot = sns.barplot(
data=summary_df,
y='Use Type',
x='Total Combined',
palette='viridis'
)
# Add values to bars
for i, row in summary_df.iterrows():
plt.text(
x=row['Total Combined'] + summary_df['Total Combined'].max() * 0.01,
y=i,
s=f"{row['Total Combined']:,.0f}",
va='center',
fontsize=10
)
plt.title('Total Combined Water Volume by Agricultural Use Types', fontsize=16, weight='bold')
plt.xlabel('Total Volume (mln m³)')
plt.ylabel('Agricultural Use Type')
plt.tight_layout()
plt.show()
🔝 Summary Ranking by Total Combined Volume (Agricultural Types):
| Use Type | Total Withdrawal | Total Used | Total Discharge | Total Combined | % Share | |
|---|---|---|---|---|---|---|
| 4 | Agriculture(hayfield bay) | 13 | 13 | 0 | 26 | 49.01% |
| 1 | Agriculture (regular irrigation) | 6 | 6 | 0 | 12 | 23.21% |
| 2 | Agriculture(estuary irrigation) | 6 | 5 | 0 | 11 | 20.66% |
| 0 | Agriculture(agricultural water supply) | 1 | 1 | 0 | 2 | 4.44% |
| 3 | Agriculture (irrigation of pastures) | 1 | 1 | 0 | 1 | 2.68% |
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\251543407.py:66: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect. barplot = sns.barplot(
Water use efficience by Agricultural water use types¶
In [48]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
# Define agricultural water use types
agri_types = [
'Agriculture(agricultural water supply)',
'Agriculture (regular irrigation)',
'Agriculture(estuary irrigation)',
'Agriculture (irrigation of pastures)',
'Agriculture(hayfield bay)'
]
# Initialize containers for summary results
efficiency_summary = []
basin_ranking_data = {}
# Loop through each agricultural type
for use_type in agri_types:
df_type = df[df['Types of water use'] == use_type]
# Group by basin and calculate totals
grouped_basin = df_type.groupby('Water management basin')[
['Total water withdrawal', 'Total water used', 'Total water discharge']
].sum().reset_index()
# Calculate efficiency per basin
grouped_basin['Efficiency (Used/Withdrawal)'] = (
grouped_basin['Total water used'] / grouped_basin['Total water withdrawal']
).replace([float('inf'), -float('inf')], pd.NA)
# Save basin ranking
basin_ranking_data[use_type] = grouped_basin.sort_values(by='Total water used', ascending=False)
# Global summary
total_withdrawal = grouped_basin['Total water withdrawal'].sum()
total_used = grouped_basin['Total water used'].sum()
total_discharge = grouped_basin['Total water discharge'].sum()
efficiency = total_used / total_withdrawal if total_withdrawal > 0 else 0
efficiency_summary.append({
'Use Type': use_type,
'Total Withdrawal': total_withdrawal,
'Total Used': total_used,
'Total Discharge': total_discharge,
'Efficiency (Used/Withdrawal)': round(efficiency, 3)
})
# Create summary DataFrame
efficiency_df = pd.DataFrame(efficiency_summary).sort_values(by='Efficiency (Used/Withdrawal)', ascending=False)
# 🔹 Show summary table with formatting
display(efficiency_df.style
.background_gradient(cmap='YlGnBu')
.format({
'Total Withdrawal': '{:,.0f}',
'Total Used': '{:,.0f}',
'Total Discharge': '{:,.0f}',
'Efficiency (Used/Withdrawal)': '{:.2f}'
}))
| Use Type | Total Withdrawal | Total Used | Total Discharge | Efficiency (Used/Withdrawal) | |
|---|---|---|---|---|---|
| 4 | Agriculture(hayfield bay) | 13 | 13 | 0 | 1.00 |
| 0 | Agriculture(agricultural water supply) | 1 | 1 | 0 | 0.99 |
| 3 | Agriculture (irrigation of pastures) | 1 | 1 | 0 | 0.99 |
| 2 | Agriculture(estuary irrigation) | 6 | 5 | 0 | 0.95 |
| 1 | Agriculture (regular irrigation) | 6 | 6 | 0 | 0.88 |
In [49]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
EPSILON = 1e-6 # to avoid division by zero
# List of agricultural water use types
agri_types = [
'Agriculture(agricultural water supply)',
'Agriculture (regular irrigation)',
'Agriculture(estuary irrigation)',
'Agriculture (irrigation of pastures)',
'Agriculture(hayfield bay)'
]
# Store total usage and efficiency summary for ranking
summary_data = []
for use_type in agri_types:
df_type = df[df['Types of water use'] == use_type]
# Group by year and basin
grouped = df_type.groupby(['Year', 'Water management basin'])[
['Total water withdrawal', 'Total water used', 'Total water discharge']
].sum().reset_index()
if grouped.empty:
print(f"Skipping {use_type} – No data found.")
continue
# Calculate efficiency = Total Used / Total Withdrawal (avoid div by zero)
grouped['Efficiency'] = grouped['Total water used'] / (grouped['Total water withdrawal'] + EPSILON)
# Calculate total values for ranking
totals = grouped[['Total water withdrawal', 'Total water used', 'Total water discharge']].sum()
total_all = totals.sum()
mean_efficiency = grouped['Efficiency'].mean()
summary_data.append({
'Use Type': use_type,
'Total Withdrawal': totals['Total water withdrawal'],
'Total Used': totals['Total water used'],
'Total Discharge': totals['Total water discharge'],
'Total Combined': total_all,
'Mean Efficiency': mean_efficiency
})
# Prepare pivot tables for plotting
pivot_withdrawal = grouped.pivot(index='Year', columns='Water management basin', values='Total water withdrawal')
pivot_used = grouped.pivot(index='Year', columns='Water management basin', values='Total water used')
pivot_discharge = grouped.pivot(index='Year', columns='Water management basin', values='Total water discharge')
pivot_efficiency = grouped.pivot(index='Year', columns='Water management basin', values='Efficiency')
sns.set(style='whitegrid')
plt.figure(figsize=(18, 16))
plt.subplot(4, 1, 1)
sns.lineplot(data=pivot_withdrawal)
plt.title(f'{use_type} – Water Withdrawal by Basin')
plt.xlabel('Year')
plt.ylabel('Volume (mln m³)')
plt.legend(title='Basin', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.subplot(4, 1, 2)
sns.lineplot(data=pivot_used)
plt.title(f'{use_type} – Water Used by Basin')
plt.xlabel('Year')
plt.ylabel('Volume (mln m³)')
plt.legend(title='Basin', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.subplot(4, 1, 3)
sns.lineplot(data=pivot_discharge)
plt.title(f'{use_type} – Water Discharge by Basin')
plt.xlabel('Year')
plt.ylabel('Volume (mln m³)')
plt.legend(title='Basin', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.subplot(4, 1, 4)
sns.lineplot(data=pivot_efficiency)
plt.title(f'{use_type} – Efficiency (Used / Withdrawal) by Basin')
plt.xlabel('Year')
plt.ylabel('Efficiency')
plt.legend(title='Basin', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
# Statistical summary for efficiency by basin
eff_summary = grouped.groupby('Water management basin')['Efficiency'].agg(['mean', 'min', 'max']).reset_index()
print(f"\nEfficiency Summary for {use_type}:")
print(eff_summary)
# Correlation matrix for water metrics + efficiency
corr_df = grouped[['Total water withdrawal', 'Total water used', 'Total water discharge', 'Efficiency']]
corr_matrix = corr_df.corr()
print(f"\nCorrelation Matrix for {use_type}:")
print(corr_matrix.round(3))
# Correlation heatmap
plt.figure(figsize=(7, 5))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title(f'Correlation Matrix of Water Metrics and Efficiency for {use_type}')
plt.tight_layout()
plt.show()
# Summary DataFrame for ranking
summary_df = pd.DataFrame(summary_data).sort_values(by='Total Combined', ascending=False)
print("\n🔝 Summary Ranking by Total Combined Volume and Mean Efficiency (Agricultural Types):")
print(summary_df)
# Plot bar chart of total combined volume
plt.figure(figsize=(10, 6))
sns.barplot(
data=summary_df,
x='Use Type',
y='Total Combined',
palette='viridis',
ci=None
)
plt.title('Total Water Volume by Agricultural Water Use Type')
plt.ylabel('Total Volume (mln m³)')
plt.xlabel('Agricultural Use Type')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
# Plot bar chart of mean efficiency
plt.figure(figsize=(10, 6))
sns.barplot(
data=summary_df,
x='Use Type',
y='Mean Efficiency',
palette='magma',
ci=None
)
plt.title('Mean Efficiency (Used / Withdrawal) by Agricultural Water Use Type')
plt.ylabel('Mean Efficiency')
plt.xlabel('Agricultural Use Type')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
Efficiency Summary for Agriculture(agricultural water supply):
Water management basin mean min max
0 Ertis Basin 0.999494 0.990907 0.999973
1 Esil Basin 0.989536 0.961457 0.999739
2 Nura-Sarysu 0.975181 0.843359 1.003978
3 Tobol-Torgai 0.992066 0.983322 0.999505
4 Zhayik-Caspian 0.975601 0.847841 0.999580
Correlation Matrix for Agriculture(agricultural water supply):
Total water withdrawal Total water used \
Total water withdrawal 1.000 1.000
Total water used 1.000 1.000
Total water discharge 0.025 0.020
Efficiency 0.225 0.236
Total water discharge Efficiency
Total water withdrawal 0.025 0.225
Total water used 0.020 0.236
Total water discharge 1.000 -0.147
Efficiency -0.147 1.000
Efficiency Summary for Agriculture (regular irrigation):
Water management basin mean min max
0 Ertis Basin 0.848226 0.750542 0.971573
1 Esil Basin 0.954000 0.850966 0.999463
2 Nura-Sarysu 0.889445 0.729763 0.977230
3 Tobol-Torgai 0.995431 0.962424 0.999954
4 Zhayik-Caspian 0.918290 0.704022 0.999902
Correlation Matrix for Agriculture (regular irrigation):
Total water withdrawal Total water used \
Total water withdrawal 1.000 0.995
Total water used 0.995 1.000
Total water discharge -0.120 -0.122
Efficiency -0.480 -0.411
Total water discharge Efficiency
Total water withdrawal -0.120 -0.480
Total water used -0.122 -0.411
Total water discharge 1.000 0.122
Efficiency 0.122 1.000
Efficiency Summary for Agriculture(estuary irrigation):
Water management basin mean min max
0 Ertis Basin 0.948294 0.910071 0.962016
1 Esil Basin 0.599772 0.000000 0.999954
2 Nura-Sarysu 0.099990 0.000000 0.999930
3 Tobol-Torgai 0.199893 0.000000 0.999900
4 Zhayik-Caspian 0.884047 0.000000 0.999984
Correlation Matrix for Agriculture(estuary irrigation):
Total water withdrawal Total water used \
Total water withdrawal 1.000 1.000
Total water used 1.000 1.000
Total water discharge 0.005 0.006
Efficiency 0.534 0.537
Total water discharge Efficiency
Total water withdrawal 0.005 0.534
Total water used 0.006 0.537
Total water discharge 1.000 0.093
Efficiency 0.093 1.000
Efficiency Summary for Agriculture (irrigation of pastures):
Water management basin mean min max
0 Ertis Basin 0.991131 0.823409 0.999966
1 Esil Basin 0.539523 0.000000 0.999756
2 Nura-Sarysu 0.000000 0.000000 0.000000
3 Tobol-Torgai 0.000000 0.000000 0.000000
4 Zhayik-Caspian 0.995420 0.958339 0.999953
Correlation Matrix for Agriculture (irrigation of pastures):
Total water withdrawal Total water used \
Total water withdrawal 1.000 0.998
Total water used 0.998 1.000
Total water discharge NaN NaN
Efficiency 0.661 0.670
Total water discharge Efficiency
Total water withdrawal NaN 0.661
Total water used NaN 0.670
Total water discharge NaN NaN
Efficiency NaN 1.000
Efficiency Summary for Agriculture(hayfield bay):
Water management basin mean min max
0 Ertis Basin 0.999995 0.999922 0.999999
1 Esil Basin 0.000000 0.000000 0.000000
2 Nura-Sarysu 0.000000 0.000000 0.000000
3 Tobol-Torgai 0.000000 0.000000 0.000000
4 Zhayik-Caspian 0.299975 0.000000 0.999979
Correlation Matrix for Agriculture(hayfield bay):
Total water withdrawal Total water used \
Total water withdrawal 1.000 1.000
Total water used 1.000 1.000
Total water discharge NaN NaN
Efficiency 0.816 0.816
Total water discharge Efficiency
Total water withdrawal NaN 0.816
Total water used NaN 0.816
Total water discharge NaN NaN
Efficiency NaN 1.000
🔝 Summary Ranking by Total Combined Volume and Mean Efficiency (Agricultural Types):
Use Type Total Withdrawal Total Used \
4 Agriculture(hayfield bay) 12.844193 12.844193
1 Agriculture (regular irrigation) 6.481397 5.685698
2 Agriculture(estuary irrigation) 5.544283 5.280090
0 Agriculture(agricultural water supply) 1.152552 1.144349
3 Agriculture (irrigation of pastures) 0.707426 0.699337
Total Discharge Total Combined Mean Efficiency
4 0.000000 25.688385 0.259994
1 0.000053 12.167148 0.921078
2 0.002843 10.827215 0.546399
0 0.029147 2.326048 0.986376
3 0.000000 1.406763 0.505215
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\1700953387.py:115: FutureWarning: The `ci` parameter is deprecated. Use `errorbar=None` for the same effect. sns.barplot( C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\1700953387.py:115: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect. sns.barplot(
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\1700953387.py:131: FutureWarning: The `ci` parameter is deprecated. Use `errorbar=None` for the same effect. sns.barplot( C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\1700953387.py:131: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect. sns.barplot(
Analyzing by each basin agricultural water use efficiency¶
In [50]:
# Ensure plots show clearly
sns.set(style="whitegrid")
# Plot settings
n_types = len(agri_types)
fig, axes = plt.subplots(n_types, 1, figsize=(12, 5 * n_types))
# Loop and plot for each type
for i, use_type in enumerate(agri_types):
basin_df = basin_ranking_data[use_type].copy()
# Format and sort
basin_df['Efficiency'] = basin_df['Efficiency (Used/Withdrawal)'].round(3)
basin_df.sort_values(by='Efficiency', ascending=False, inplace=True)
# Plot
ax = axes[i] if n_types > 1 else axes
sns.barplot(
data=basin_df,
x='Efficiency',
y='Water management basin',
ax=ax,
palette='viridis'
)
ax.set_title(f"💧 Efficiency by Basin — {use_type}", fontsize=14)
ax.set_xlabel("Efficiency (Used / Withdrawal)")
ax.set_ylabel("Water Management Basin")
ax.set_xlim(0, 1.05)
plt.tight_layout()
plt.show()
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\2057097215.py:18: FutureWarning:
Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.
sns.barplot(
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\2057097215.py:18: FutureWarning:
Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.
sns.barplot(
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\2057097215.py:18: FutureWarning:
Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.
sns.barplot(
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\2057097215.py:18: FutureWarning:
Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.
sns.barplot(
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\2057097215.py:18: FutureWarning:
Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.
sns.barplot(
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\2057097215.py:30: UserWarning: Glyph 128167 (\N{DROPLET}) missing from font(s) Arial.
plt.tight_layout()
C:\Users\aidos\anaconda3\envs\gee\Lib\site-packages\IPython\core\pylabtools.py:170: UserWarning: Glyph 128167 (\N{DROPLET}) missing from font(s) Arial.
fig.canvas.print_figure(bytes_io, **kw)
Analyze FOR other water use categories there are 14 TYPES¶
Identify which one has the highest overall water impact (based on withdrawal, use, discharge).¶
Plan:¶
Loop through each Types of water use
For each type:
Group by year and basin
Sum withdrawal, used, discharge
Plot trends
Calculate total usage
At the end, compare total usage of all 14 types to find the most impactful one.
In [52]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Ensure your DataFrame `df` is already defined
# List of other use types
other_types = [
'Pond and fishing industry',
'Watering of green spaces',
'Channel flushing',
'Maintaining set horizons',
'Filling of bulk reservoirs',
'Maintaining reservoir pressure',
'Other needs',
'Resetting the SDR without using',
'Transit water',
'Forced water intake into canals',
'Transferred to another (BWMA)',
'Transferred to another country',
'Discharge of sewage drainage waters',
'Sanitary permits'
]
# Prepare summary data
summary_data = []
for use_type in other_types:
df_type = df[df['Types of water use'] == use_type]
grouped = df_type.groupby(['Year', 'Water management basin'])[
['Total water withdrawal', 'Total water used', 'Total water discharge']
].sum().reset_index()
if grouped.empty:
continue
totals = grouped[['Total water withdrawal', 'Total water used', 'Total water discharge']].sum()
total_all = totals.sum()
summary_data.append({
'Use Type': use_type,
'Total Withdrawal': totals['Total water withdrawal'],
'Total Used': totals['Total water used'],
'Total Discharge': totals['Total water discharge'],
'Total Combined': total_all
})
# Create and sort summary DataFrame
summary_df = pd.DataFrame(summary_data).sort_values(by='Total Combined', ascending=False)
# Optional: Add % share column
summary_df['% Share'] = (summary_df['Total Combined'] / summary_df['Total Combined'].sum() * 100).round(2)
# === PLOT ===
sns.set(style="whitegrid")
plt.figure(figsize=(14, 8))
barplot = sns.barplot(
data=summary_df,
y='Use Type',
x='Total Combined',
palette='crest'
)
# Add values on the bars
for i, row in summary_df.iterrows():
plt.text(
x=row['Total Combined'] + summary_df['Total Combined'].max() * 0.01,
y=i,
s=f"{row['Total Combined']:,.0f}",
va='center',
fontsize=10
)
# Titles and axis labels
plt.title('Total Combined Water Volume by Other Use Types', fontsize=18, weight='bold')
plt.xlabel('Total Volume (mln m³)', fontsize=14)
plt.ylabel('Types of Water Use', fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=11)
plt.tight_layout()
plt.show()
from IPython.display import display
print("Summary Ranking by Total Combined Volume:")
display(summary_df.style.format({
'Total Withdrawal': '{:,.0f}',
'Total Used': '{:,.0f}',
'Total Discharge': '{:,.0f}',
'Total Combined': '{:,.0f}',
'% Share': '{:.2f}%'
}).background_gradient(subset=['Total Combined'], cmap='Greens'))
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\3201950326.py:59: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect. barplot = sns.barplot(
Summary Ranking by Total Combined Volume:
| Use Type | Total Withdrawal | Total Used | Total Discharge | Total Combined | % Share | |
|---|---|---|---|---|---|---|
| 3 | Maintaining set horizons | 6 | 4 | 0 | 10 | 23.87% |
| 13 | Sanitary permits | 4 | 4 | 1 | 8 | 18.58% |
| 6 | Other needs | 2 | 2 | 1 | 6 | 13.03% |
| 12 | Discharge of sewage drainage waters | 0 | 0 | 5 | 5 | 12.56% |
| 4 | Filling of bulk reservoirs | 2 | 2 | 0 | 5 | 10.83% |
| 0 | Pond and fishing industry | 2 | 1 | 0 | 3 | 7.71% |
| 10 | Transferred to another (BWMA) | 2 | 0 | 0 | 2 | 5.48% |
| 5 | Maintaining reservoir pressure | 1 | 1 | 0 | 2 | 3.71% |
| 7 | Resetting the SDR without using | 1 | 0 | 1 | 2 | 3.65% |
| 1 | Watering of green spaces | 0 | 0 | 0 | 0 | 0.40% |
| 8 | Transit water | 0 | 0 | 0 | 0 | 0.19% |
| 2 | Channel flushing | 0 | 0 | 0 | 0 | 0.00% |
| 9 | Forced water intake into canals | 0 | 0 | 0 | 0 | 0.00% |
| 11 | Transferred to another country | 0 | 0 | 0 | 0 | 0.00% |
In [54]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import matplotlib.ticker as ticker
EPSILON = 1e-6 # Avoid division by zero
other_types = [
'Pond and fishing industry',
'Watering of green spaces',
'Channel flushing',
'Maintaining set horizons',
'Filling of bulk reservoirs',
'Maintaining reservoir pressure',
'Other needs',
'Resetting the SDR without using',
'Transit water',
'Forced water intake into canals',
'Transferred to another (BWMA)',
'Transferred to another country',
'Discharge of sewage drainage waters',
'Sanitary permits'
]
summary_data = []
for use_type in other_types:
df_type = df[df['Types of water use'] == use_type]
grouped = df_type.groupby(['Year', 'Water management basin'])[
['Total water withdrawal', 'Total water used', 'Total water discharge']
].sum().reset_index()
if grouped.empty:
print(f"Skipping {use_type} – No data found.")
continue
grouped['Efficiency'] = grouped['Total water used'] / (grouped['Total water withdrawal'] + EPSILON)
totals = grouped[['Total water withdrawal', 'Total water used', 'Total water discharge']].sum()
total_all = totals.sum()
mean_efficiency = grouped['Efficiency'].mean()
summary_data.append({
'Use Type': use_type,
'Total Withdrawal': totals['Total water withdrawal'],
'Total Used': totals['Total water used'],
'Total Discharge': totals['Total water discharge'],
'Total Combined': total_all,
'Mean Efficiency': round(mean_efficiency, 3)
})
pivot_withdrawal = grouped.pivot(index='Year', columns='Water management basin', values='Total water withdrawal')
pivot_used = grouped.pivot(index='Year', columns='Water management basin', values='Total water used')
pivot_discharge = grouped.pivot(index='Year', columns='Water management basin', values='Total water discharge')
pivot_efficiency = grouped.pivot(index='Year', columns='Water management basin', values='Efficiency')
sns.set(style='whitegrid', font_scale=1.2)
plt.figure(figsize=(18, 18))
plt.subplot(4, 1, 1)
sns.lineplot(data=pivot_withdrawal, linewidth=2.2)
plt.title(f'{use_type} – Water Withdrawal by Basin', fontsize=15, weight='bold')
plt.ylabel('Volume (mln m³)')
plt.xlabel('Year')
plt.legend(title='Basin', bbox_to_anchor=(1.01, 1), loc='upper left')
plt.subplot(4, 1, 2)
sns.lineplot(data=pivot_used, linewidth=2.2)
plt.title(f'{use_type} – Water Used by Basin', fontsize=15, weight='bold')
plt.ylabel('Volume (mln m³)')
plt.xlabel('Year')
plt.legend(title='Basin', bbox_to_anchor=(1.01, 1), loc='upper left')
plt.subplot(4, 1, 3)
sns.lineplot(data=pivot_discharge, linewidth=2.2)
plt.title(f'{use_type} – Water Discharge by Basin', fontsize=15, weight='bold')
plt.ylabel('Volume (mln m³)')
plt.xlabel('Year')
plt.legend(title='Basin', bbox_to_anchor=(1.01, 1), loc='upper left')
plt.subplot(4, 1, 4)
sns.lineplot(data=pivot_efficiency, linewidth=2.2)
plt.title(f'{use_type} – Efficiency (Used / Withdrawal) by Basin', fontsize=15, weight='bold')
plt.ylabel('Efficiency')
plt.xlabel('Year')
plt.legend(title='Basin', bbox_to_anchor=(1.01, 1), loc='upper left')
plt.tight_layout(rect=[0, 0, 0.85, 1])
plt.show()
eff_summary = grouped.groupby('Water management basin')['Efficiency'].agg(['mean', 'min', 'max']).round(3).reset_index()
print(f"\n📌 Efficiency Summary for: {use_type}")
print(eff_summary)
corr_df = grouped[['Total water withdrawal', 'Total water used', 'Total water discharge', 'Efficiency']]
corr_matrix = corr_df.corr().round(3)
print(f"\n🔍 Correlation Matrix for {use_type}:")
print(corr_matrix)
plt.figure(figsize=(7, 5))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title(f'Correlation Matrix of Water Metrics and Efficiency – {use_type}', fontsize=13)
plt.tight_layout()
plt.show()
# --- Final summary ---
summary_df = pd.DataFrame(summary_data).sort_values(by='Total Combined', ascending=False)
print("\n📊 Summary Ranking by Total Combined Volume and Mean Efficiency:")
print(summary_df)
# Bar chart: Total Volume
plt.figure(figsize=(14, 6))
sns.barplot(data=summary_df, x='Use Type', y='Total Combined', palette='rocket')
plt.title('Total Water Volume by Other Use Types', fontsize=16, weight='bold')
plt.ylabel('Total Volume (mln m³)')
plt.xlabel('Types of Water Use')
plt.xticks(rotation=45, ha='right')
plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: f'{int(x):,}'))
plt.tight_layout()
plt.show()
# Bar chart: Mean Efficiency
plt.figure(figsize=(14, 6))
sns.barplot(data=summary_df, x='Use Type', y='Mean Efficiency', palette='viridis')
plt.title('Mean Efficiency (Used / Withdrawal) by Other Use Types', fontsize=16, weight='bold')
plt.ylabel('Efficiency')
plt.xlabel('Types of Water Use')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
📌 Efficiency Summary for: Pond and fishing industry
Water management basin mean min max
0 Ertis Basin 0.650 0.000 1.000
1 Esil Basin 0.998 0.975 1.000
2 Nura-Sarysu 0.994 0.983 0.999
3 Tobol-Torgai 0.996 0.989 0.999
4 Zhayik-Caspian 0.814 0.602 0.999
🔍 Correlation Matrix for Pond and fishing industry:
Total water withdrawal Total water used \
Total water withdrawal 1.000 0.978
Total water used 0.978 1.000
Total water discharge 0.578 0.451
Efficiency -0.095 -0.061
Total water discharge Efficiency
Total water withdrawal 0.578 -0.095
Total water used 0.451 -0.061
Total water discharge 1.000 -0.083
Efficiency -0.083 1.000
📌 Efficiency Summary for: Watering of green spaces
Water management basin mean min max
0 Ertis Basin 0.980 0.723 0.999
1 Esil Basin 0.797 0.408 1.000
2 Nura-Sarysu 0.888 0.548 0.961
3 Tobol-Torgai 0.000 0.000 0.000
4 Zhayik-Caspian 0.925 0.371 1.000
🔍 Correlation Matrix for Watering of green spaces:
Total water withdrawal Total water used \
Total water withdrawal 1.000 0.969
Total water used 0.969 1.000
Total water discharge 0.091 0.115
Efficiency 0.227 0.279
Total water discharge Efficiency
Total water withdrawal 0.091 0.227
Total water used 0.115 0.279
Total water discharge 1.000 0.129
Efficiency 0.129 1.000
📌 Efficiency Summary for: Channel flushing
Water management basin mean min max
0 Ertis Basin 0.000 0.0 0.000
1 Esil Basin 0.048 0.0 0.959
2 Nura-Sarysu 0.000 0.0 0.000
3 Tobol-Torgai 0.000 0.0 0.000
4 Zhayik-Caspian 0.000 0.0 0.000
🔍 Correlation Matrix for Channel flushing:
Total water withdrawal Total water used \
Total water withdrawal 1.0 1.0
Total water used 1.0 1.0
Total water discharge NaN NaN
Efficiency 1.0 1.0
Total water discharge Efficiency
Total water withdrawal NaN 1.0
Total water used NaN 1.0
Total water discharge NaN NaN
Efficiency NaN 1.0
📌 Efficiency Summary for: Maintaining set horizons
Water management basin mean min max
0 Ertis Basin 0.719 0.000 1.0
1 Esil Basin 0.000 0.000 0.0
2 Nura-Sarysu 0.000 0.000 0.0
3 Tobol-Torgai 0.000 0.000 0.0
4 Zhayik-Caspian 0.706 0.427 1.0
🔍 Correlation Matrix for Maintaining set horizons:
Total water withdrawal Total water used \
Total water withdrawal 1.000 0.997
Total water used 0.997 1.000
Total water discharge 0.107 0.090
Efficiency 0.547 0.576
Total water discharge Efficiency
Total water withdrawal 0.107 0.547
Total water used 0.090 0.576
Total water discharge 1.000 0.287
Efficiency 0.287 1.000
📌 Efficiency Summary for: Filling of bulk reservoirs
Water management basin mean min max
0 Ertis Basin 0.000 0.0 0.000
1 Esil Basin 0.046 0.0 0.922
2 Nura-Sarysu 0.784 0.0 0.963
3 Tobol-Torgai 0.000 0.0 0.000
4 Zhayik-Caspian 0.893 0.0 1.000
🔍 Correlation Matrix for Filling of bulk reservoirs:
Total water withdrawal Total water used \
Total water withdrawal 1.000 0.991
Total water used 0.991 1.000
Total water discharge 0.007 0.002
Efficiency 0.578 0.583
Total water discharge Efficiency
Total water withdrawal 0.007 0.578
Total water used 0.002 0.583
Total water discharge 1.000 0.188
Efficiency 0.188 1.000
📌 Efficiency Summary for: Maintaining reservoir pressure
Water management basin mean min max
0 Ertis Basin 0.000 0.000 0.0
1 Esil Basin 0.000 0.000 0.0
2 Nura-Sarysu 0.000 0.000 0.0
3 Tobol-Torgai 0.000 0.000 0.0
4 Zhayik-Caspian 0.996 0.967 1.0
🔍 Correlation Matrix for Maintaining reservoir pressure:
Total water withdrawal Total water used \
Total water withdrawal 1.000 1.000
Total water used 1.000 1.000
Total water discharge 0.428 0.428
Efficiency 0.839 0.839
Total water discharge Efficiency
Total water withdrawal 0.428 0.839
Total water used 0.428 0.839
Total water discharge 1.000 0.431
Efficiency 0.431 1.000
📌 Efficiency Summary for: Other needs
Water management basin mean min max
0 Ertis Basin 0.600 0.000 1.0
1 Esil Basin 0.941 0.000 1.0
2 Nura-Sarysu 0.893 0.706 1.0
3 Tobol-Torgai 0.540 0.000 1.0
4 Zhayik-Caspian 0.868 0.751 1.0
🔍 Correlation Matrix for Other needs:
Total water withdrawal Total water used \
Total water withdrawal 1.000 0.991
Total water used 0.991 1.000
Total water discharge 0.454 0.559
Efficiency 0.266 0.302
Total water discharge Efficiency
Total water withdrawal 0.454 0.266
Total water used 0.559 0.302
Total water discharge 1.000 0.366
Efficiency 0.366 1.000
📌 Efficiency Summary for: Resetting the SDR without using
Water management basin mean min max
0 Ertis Basin 0.0 0.0 0.0
1 Esil Basin 0.0 0.0 0.0
2 Nura-Sarysu 0.0 0.0 0.0
3 Tobol-Torgai 0.0 0.0 0.0
4 Zhayik-Caspian 0.0 0.0 0.0
🔍 Correlation Matrix for Resetting the SDR without using:
Total water withdrawal Total water used \
Total water withdrawal 1.000 NaN
Total water used NaN NaN
Total water discharge 0.998 NaN
Efficiency NaN NaN
Total water discharge Efficiency
Total water withdrawal 0.998 NaN
Total water used NaN NaN
Total water discharge 1.000 NaN
Efficiency NaN NaN
📌 Efficiency Summary for: Transit water
Water management basin mean min max
0 Ertis Basin 0.0 0.0 0.0
1 Esil Basin 0.0 0.0 0.0
2 Nura-Sarysu 0.0 0.0 0.0
3 Tobol-Torgai 0.0 0.0 0.0
4 Zhayik-Caspian 0.0 0.0 0.0
🔍 Correlation Matrix for Transit water:
Total water withdrawal Total water used \
Total water withdrawal 1.0 NaN
Total water used NaN NaN
Total water discharge 1.0 NaN
Efficiency NaN NaN
Total water discharge Efficiency
Total water withdrawal 1.0 NaN
Total water used NaN NaN
Total water discharge 1.0 NaN
Efficiency NaN NaN
📌 Efficiency Summary for: Forced water intake into canals
Water management basin mean min max
0 Ertis Basin 0.0 0.0 0.0
1 Esil Basin 0.0 0.0 0.0
2 Nura-Sarysu 0.0 0.0 0.0
3 Tobol-Torgai 0.0 0.0 0.0
4 Zhayik-Caspian 0.0 0.0 0.0
🔍 Correlation Matrix for Forced water intake into canals:
Total water withdrawal Total water used \
Total water withdrawal NaN NaN
Total water used NaN NaN
Total water discharge NaN NaN
Efficiency NaN NaN
Total water discharge Efficiency
Total water withdrawal NaN NaN
Total water used NaN NaN
Total water discharge NaN NaN
Efficiency NaN NaN
📌 Efficiency Summary for: Transferred to another (BWMA)
Water management basin mean min max
0 Ertis Basin 0.0 0.0 0.0
1 Esil Basin 0.0 0.0 0.0
2 Nura-Sarysu 0.0 0.0 0.0
3 Tobol-Torgai 0.0 0.0 0.0
4 Zhayik-Caspian 0.0 0.0 0.0
🔍 Correlation Matrix for Transferred to another (BWMA):
Total water withdrawal Total water used \
Total water withdrawal 1.0 NaN
Total water used NaN NaN
Total water discharge NaN NaN
Efficiency NaN NaN
Total water discharge Efficiency
Total water withdrawal NaN NaN
Total water used NaN NaN
Total water discharge NaN NaN
Efficiency NaN NaN
📌 Efficiency Summary for: Transferred to another country
Water management basin mean min max
0 Ertis Basin 0.0 0.0 0.0
1 Esil Basin 0.0 0.0 0.0
2 Nura-Sarysu 0.0 0.0 0.0
3 Tobol-Torgai 0.0 0.0 0.0
4 Zhayik-Caspian 0.0 0.0 0.0
🔍 Correlation Matrix for Transferred to another country:
Total water withdrawal Total water used \
Total water withdrawal NaN NaN
Total water used NaN NaN
Total water discharge NaN NaN
Efficiency NaN NaN
Total water discharge Efficiency
Total water withdrawal NaN NaN
Total water used NaN NaN
Total water discharge NaN NaN
Efficiency NaN NaN
📌 Efficiency Summary for: Discharge of sewage drainage waters
Water management basin mean min max
0 Ertis Basin 0.0 0.0 0.0
1 Esil Basin 0.0 0.0 0.0
2 Nura-Sarysu 0.0 0.0 0.0
3 Tobol-Torgai 0.0 0.0 0.0
4 Zhayik-Caspian 0.0 0.0 0.0
🔍 Correlation Matrix for Discharge of sewage drainage waters:
Total water withdrawal Total water used \
Total water withdrawal NaN NaN
Total water used NaN NaN
Total water discharge NaN NaN
Efficiency NaN NaN
Total water discharge Efficiency
Total water withdrawal NaN NaN
Total water used NaN NaN
Total water discharge 1.0 NaN
Efficiency NaN NaN
📌 Efficiency Summary for: Sanitary permits
Water management basin mean min max
0 Ertis Basin 0.091 0.0 0.99
1 Esil Basin 0.000 0.0 0.00
2 Nura-Sarysu 0.300 0.0 1.00
3 Tobol-Torgai 0.000 0.0 0.00
4 Zhayik-Caspian 0.837 0.0 1.00
🔍 Correlation Matrix for Sanitary permits:
Total water withdrawal Total water used \
Total water withdrawal 1.000 1.000
Total water used 1.000 1.000
Total water discharge 0.203 0.204
Efficiency 0.672 0.669
Total water discharge Efficiency
Total water withdrawal 0.203 0.672
Total water used 0.204 0.669
Total water discharge 1.000 0.396
Efficiency 0.396 1.000
📊 Summary Ranking by Total Combined Volume and Mean Efficiency:
Use Type Total Withdrawal Total Used \
3 Maintaining set horizons 5.828032 4.345899
13 Sanitary permits 3.733421 3.704657
6 Other needs 2.288972 2.075181
12 Discharge of sewage drainage waters 0.000000 0.000000
4 Filling of bulk reservoirs 2.414623 2.204548
0 Pond and fishing industry 1.717320 1.375639
10 Transferred to another (BWMA) 2.353435 0.000000
5 Maintaining reservoir pressure 0.765183 0.764867
7 Resetting the SDR without using 0.793128 0.000000
1 Watering of green spaces 0.090817 0.081905
8 Transit water 0.040120 0.000000
2 Channel flushing 0.000024 0.000024
9 Forced water intake into canals 0.000000 0.000000
11 Transferred to another country 0.000000 0.000000
Total Discharge Total Combined Mean Efficiency
3 0.074591 10.248521 0.285
13 0.536176 7.974254 0.246
6 1.227525 5.591678 0.768
12 5.390308 5.390308 0.000
4 0.029205 4.648376 0.345
0 0.217036 3.309995 0.890
10 0.000000 2.353435 0.000
5 0.060814 1.590864 0.199
7 0.771726 1.564854 0.000
1 0.000731 0.173453 0.718
8 0.040120 0.080240 0.000
2 0.000000 0.000047 0.010
9 0.000000 0.000000 0.000
11 0.000000 0.000000 0.000
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\1016445377.py:116: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect. sns.barplot(data=summary_df, x='Use Type', y='Total Combined', palette='rocket')
C:\Users\aidos\AppData\Local\Temp\ipykernel_11696\1016445377.py:127: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect. sns.barplot(data=summary_df, x='Use Type', y='Mean Efficiency', palette='viridis')
In [ ]: